DiffLDA: Topic Evolution in Software Projects

Authors

  • Stephen W. Thomas
  • Bram Adams
  • Ahmed E. Hassan
  • Dorothea Blostein
Abstract

Previous research has shown that topics can be automatically discovered in a software project’s source code. Topics are collections of words that co-occur frequently in a text collection and are discovered using topic models such as latent Dirichlet allocation (LDA). Tracking how topics evolve, i.e., grow and spread, over time is useful for supporting software maintenance, comprehension, and re-engineering activities. The evolution of topics is typically recovered by applying LDA to all versions of a project’s source code at once, followed by post processing to map topics across versions. Although this technique works well in applications where each version of the data is completely different, for example in the analysis of conference proceedings, the technique does not work well with source code, which typically changes only incrementally and contains significant duplication across versions. In this paper, we present a new approach, called DiffLDA, for automatically mining topic evolution in source code. The approach addresses LDA’s sensitivity to document duplication by operating on the differences between versions of a source code document, resulting in a more accurate, finer-grained representation of topic evolution. We validate our approach through case studies on simulated data and two open source projects.
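The idea of operating on the differences between versions, rather than on every full copy of a file, can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the helper name `delta_documents`, the line-level diff via Python's standard `difflib`, and the whitespace tokenization are all choices made here for the example. Each version of a file is reduced to the words it adds and the words it removes relative to the previous version, so unchanged text is not fed to the topic model again and again.

```python
import difflib


def delta_documents(versions):
    """Reduce successive versions of one source file to per-version deltas.

    `versions` is a list of file contents (strings), oldest first.
    Returns one (added_words, removed_words) pair per version.
    Hypothetical helper for illustration; the first version is
    treated as if every line had been added.
    """
    deltas = []
    previous = []
    for text in versions:
        current = text.splitlines()
        added, removed = [], []
        # ndiff marks added lines with "+ " and removed lines with "- ".
        for line in difflib.ndiff(previous, current):
            if line.startswith("+ "):
                added.extend(line[2:].split())
            elif line.startswith("- "):
                removed.extend(line[2:].split())
        deltas.append((added, removed))
        previous = current
    return deltas


if __name__ == "__main__":
    history = [
        "open file\nread data",
        "open file\nread data\nclose file",
        "open file\nclose file",
    ]
    for number, (added, removed) in enumerate(delta_documents(history), 1):
        print(number, "added:", added, "removed:", removed)
```

In a pipeline of this kind, the added and removed word lists would stand in for the full file contents as the documents given to a topic model; how the resulting topic assignments are combined into per-version topic metrics is not covered by this sketch.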

Similar resources

New Approach for Solving Software Project Scheduling Problem Using Differential Evolution Algorithm

Software Project Scheduling Problem (SPSP) is one of the most critical issues in developing software. The major factor in completing the software project consistent with planned cost and schedule is implementing accurate and true scheduling. The subject of SPSP is an important topic which in software projects development and management should be considered over other topics and software project...

Evolutionary Development of Software Architectures

Today’s software development projects are confronted with a frequently changing environment: rapidly altering business domains and processes, fast technology evolution, great variety of evolving methods and development processes. Therefore an evolutionary development approach is required particularly for such critical success factor like a system’s software architecture. However, existing speci...

Evolution of Open Source Software Systems – A Large-Scale Investigation

In this paper, the evolution of a large sample of open source software projects will be analysed. The evolution of commercial systems has been an issue that has long been a center of research, thus a coherent theoretical framework of software evolution has been developed and empirically tested. Therefore these results can be used to compare the situation in open source projects to the evolution...

Total Software Process Model Evolution in EPOS

This paper presents a case study of Norwegian banking software house where the objective is to adopt a categorization framework for managing evolution in software projects to identify project profiles and evolution patterns, and to suggest improvements to better support frequent evolutions. Based on an analysis of collected evolution data from an ongoing case study, we elaborate a QIP-inspired ...

Empirical study of software quality evolution in open source projects using agile practices

We analyse the time evolution of two open source Java projects: Eclipse and Netbeans, both developed following agile practices, though to a different extent. Our study is centered on quality analysis of the systems, measured as defects absence, and its relation with software metrics evolution. The two projects are described through a software graph in which nodes are represented by Java files a...

Publication date: 2010